ABSTRACT

Collisionless stellar systems are driven towards equilibrium by mixing of phase-space elements. I show that the excess-mass function $D(f) = \int_{\bar F(\mathbf{x},\mathbf{v})>f} \big(\bar F(\mathbf{x},\mathbf{v}) - f\big)\,\mathrm{d}^3x\,\mathrm{d}^3v$ (with $\bar F(\mathbf{x},\mathbf{v})$ the coarse-grained distribution function) always decreases on mixing. $D(f)$ gives the excess mass from values of $\bar F(\mathbf{x},\mathbf{v}) > f$. This novel form of the mixing theorem extends the maximum phase-space density argument to all values of $f$. The excess-mass function can be computed from $N$-body simulations and is additive: the excess mass of a combination of non-overlapping systems is the sum of their individual $D(f)$. I propose a novel interpretation for the coarse-grained distribution function, which avoids conceptual problems with the mixing theorem.

As an example application, I show that for self-gravitating cusps ($\rho \propto r^{-\gamma}$ as $r \to 0$) the
excess mass $D \propto f^{-2(3-\gamma)/(6-\gamma)}$ as $f \to \infty$, i.e. steeper cusps are less mixed than shallower ones,
independent of the shape of surfaces of constant density or details of the distribution function 
(e.g. anisotropy). This property, together with the additivity of D(f) and the mixing theorem, 
implies that a merger remnant cannot have a cusp steeper than the steepest of its progenitors. 
Furthermore, I argue that the remnant's cusp should not be shallower either, implying that the 
steepest cusp always survives. 

Key words: stellar dynamics - methods: analytical - methods: statistical - galaxies: interac- 
tions - galaxies: haloes - galaxies: structure 



1 INTRODUCTION 

The dynamical state of a stellar system is completely described by its 'fine-grained' distribution function, $F(\mathbf{x},\mathbf{v},t)$, which refers to the phase-space density at point $(\mathbf{x},\mathbf{v})$ and time $t$. The time evolution of the distribution function is governed by a continuity equation, known as the Vlasov or collisionless Boltzmann equation,



$$\mathrm{d}_t F = \partial_t F + \mathbf{v}\cdot\partial_{\mathbf{x}}F - \partial_{\mathbf{x}}\Phi\cdot\partial_{\mathbf{v}}F = 0. \tag{1}$$



Here, $\Phi(\mathbf{x})$ denotes the gravitational potential, which for a self-gravitating stellar system is given by the Poisson integral



$$\Phi(\mathbf{x},t) = -G\int\frac{F(\mathbf{x}',\mathbf{v}',t)}{|\mathbf{x}-\mathbf{x}'|}\,\mathrm{d}^3x'\,\mathrm{d}^3v'. \tag{2}$$



The main objective of galactic dynamics is to solve this system of equations. This is a difficult task and most analytic work is restricted to stationary or near-stationary solutions. For these, a number of theoretical concepts have been developed, such as Jeans' theorem and perturbation theory. Galactic dynamics far from equilibrium, on the other hand, such as in a galaxy merger or collapse, is almost entirely treated with $N$-body simulations, i.e. numerical solutions of equations (1) and (2).

As emphasised already by Henon (1964) and Lynden-Bell 
(1967), the constancy of F(x,v, t) ensured by the collisionless 
Boltzmann equation (1) is of little practical use in non-equilibrium 
situations, because of mixing. Phase-space elements of high density are stretched out and folded with elements of low density, very much as cream stirred into coffee. As in this example, the elements become ever thinner until any measurement of $F(\mathbf{x},\mathbf{v},t)$ becomes impossible. In other words, the finite resolution of the system breaks the validity of the continuum limit. In such a case, the system is better described by a local average of $F$, known as the coarse-grained distribution function $\bar F(\mathbf{x},\mathbf{v},t)$. In fact, any measurement can only (hope to) recover this local average.

There are important differences between mixing in a collision- 
less system such as a galaxy, and a collisional system such as a 
gas, where mixing is driven by short-range interactions. In galaxies, 
strong forms of mixing are caused by non-local large-scale dynam- 
ics and occur only away from equilibrium but generally promote 
equilibrium. Hence, mixing is never complete in the sense of con- 
vergence to a maximum-entropy state - in fact, it can be shown that 
such a state does not exist (Tremaine, Henon & Lynden-Bell 1986). 
A mild version of mixing is phase mixing, or similar 'weak mixing' processes caused by the secular evolution of stellar systems (for instance, the merging of regular orbits into a sea of chaos mixes their phase-space densities; Merritt & Valluri 1996). Stronger forms of mixing are 'chaotic mixing' and 'violent relaxation', which are driven by large-scale fluctuations of the gravitational potential.

A simple example of mixing is presented in Figure 1: volumes of $F = 1$ (black) get stretched out (e.g. at $t = 100$) and folded together with volumes of $F = 0$ (white) until any distinction is barred by the finite resolution of any observation. So, while at $t > 1000$ in this example the fine-grained distribution function $F(\mathbf{x},\mathbf{v},t)$ displays a very complex pattern, equilibrium is reached in the coarse-grained sense, since $\partial_t \bar F = 0$.

Figure 1. Demonstration of phase mixing. Shown is the evolution of a bunch of $\sim 10^5$ phase-space points in the 1D Hamiltonian $H = p^2/2 + |q|$ of a point mass in 1D gravity. The fine-grained distribution function is either equal to one (black) or zero (white), but at late times, a smooth distribution appears.
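The behaviour shown in Figure 1 can be mimicked with a short script. The following is a minimal sketch, not the code used for the figure: the water-bag region, the grid, the number of points and the random seed are illustrative assumptions. It advances each point exactly along its orbit in $H = p^2/2 + |q|$ (two parabolic arcs per period $4\sqrt{2E}$) and compares the largest cell-averaged density on a fixed grid at $t = 0$ and $t = 1000$. Such a fixed grid is a static coarse-graining of the kind criticised in section 3; it is used here only because it is the simplest to code.

```python
import math
import random


def evolve(q, p, t):
    """Exact flow of H = p**2/2 + |q| for a time t.

    On either side of q = 0 the acceleration is constant (-1 for q > 0, +1 for q < 0),
    so each orbit consists of two parabolic arcs and has period 4*sqrt(2E).
    """
    E = 0.5 * p * p + abs(q)
    p0 = math.sqrt(2.0 * E)              # speed at the crossing of q = 0
    period = 4.0 * p0
    if period == 0.0:                    # particle at rest at the origin
        return q, p
    # orbital phase: time elapsed since the last upward crossing of q = 0
    tau = (p0 - p) if q >= 0.0 else (3.0 * p0 + p)
    tau = (tau + t) % period
    if tau < 2.0 * p0:                   # arc with q >= 0
        return p0 * tau - 0.5 * tau * tau, p0 - tau
    s = tau - 2.0 * p0                   # arc with q < 0
    return -(p0 * s - 0.5 * s * s), s - p0


def max_coarse_density(points, mass, cell=0.25, lo=-2.5, n=20):
    """Largest cell-averaged phase-space density on a fixed n x n grid (static coarse-graining)."""
    counts = [[0] * n for _ in range(n)]
    for q, p in points:
        i = min(max(int((q - lo) // cell), 0), n - 1)
        j = min(max(int((p - lo) // cell), 0), n - 1)
        counts[i][j] += 1
    return mass * max(max(row) for row in counts) / (cell * cell)


random.seed(1)
N = 20000
# water bag: F = 1 on the unit square 1 <= q <= 2, 0 <= p <= 1 (total mass 1)
bag = [(random.uniform(1.0, 2.0), random.uniform(0.0, 1.0)) for _ in range(N)]
m = 1.0 / N                              # mass per sample point
early = max_coarse_density(bag, m)
late = max_coarse_density([evolve(q, p, 1000.0) for q, p in bag], m)
print(f"max coarse-grained F: t=0 -> {early:.2f}, t=1000 -> {late:.2f}")
```

The fine-grained density of every point stays exactly 0 or 1, yet the coarse-grained maximum drops well below its initial value once the bag has been wound into thin filaments.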

Unfortunately, $\bar F$ does not obey a simple continuity equation such as (1), and hence describing its evolution is a considerable problem. Lynden-Bell (1967) derived a distribution function for the end-state of violent relaxation assuming the conservation of phase-space volumes of given density according to equation (1). However, mixing does not conserve volumes of fixed density in the coarse-grained sense (e.g. Mathur 1988) and, not surprisingly, the resulting theory is inconsistent (Arad & Lynden-Bell 2005). A similar attempt by Nakamura (2000) suffers from the same deficiency (Arad & Lynden-Bell 2005). Another approach to the dynamics of violent relaxation was taken by Chavanis (1998) in deriving a time evolution equation for $\bar F$. While this is a promising attempt, its practicality is limited and unlikely to surpass that of $N$-body simulations.

An obvious constraint on the evolution of $\bar F(\mathbf{x},\mathbf{v},t)$ is that its maximum value $\bar F_{\max}$ cannot increase. While this is applicable only if $F(\mathbf{x},\mathbf{v})$ is initially bounded, a much stronger constraint on $\bar F(\mathbf{x},\mathbf{v},t)$ is provided by a mixing theorem, a relation between the properties of $\bar F(\mathbf{x},\mathbf{v})$ before and after a mixing process. Tremaine, Henon & Lynden-Bell (1986) considered $H$-functionals of $\bar F(\mathbf{x},\mathbf{v},t)$, which are defined as¹

$$H[F] = -\int C\big(F(\mathbf{x},\mathbf{v})\big)\,\mathrm{d}^3x\,\mathrm{d}^3v, \tag{3}$$



1 The traditional definition in statistical mechanics differs by a sign.



where $C$ is a convex function with $C(0) = 0$. Tremaine et al. (see also Tolman 1938) showed that coarse-graining always increases $H$-functionals and concluded that mixing generally results in an increase of $H[\bar F]$, a result known as the '$H$-theorem'².

According to Tremaine et al. (1986), a function $F_2$ is called more mixed than $F_1$ if $H[F_2] \ge H[F_1]$ for all $H$-functionals. In particular, if $F_2$ originates from $F_1$ by mixing, $F_2$ is more mixed than $F_1$. The $H$-functional for $C(f) = f\ln f$ is equal to the entropy (in units of $k_{\mathrm{B}}$) and increases even for collisional systems. Tremaine et al. prove a mixing theorem stating that $F_2$ is more mixed than $F_1$ if and only if $\mathcal{M}_2(V) \le \mathcal{M}_1(V)$ for all $V$, where the function $\mathcal{M}(V)$ is defined in terms of the cumulative volume and mass

$$V(f) = \int_{F(\mathbf{x},\mathbf{v})>f}\mathrm{d}^3x\,\mathrm{d}^3v, \tag{4}$$

$$M(f) = \int_{F(\mathbf{x},\mathbf{v})>f}F(\mathbf{x},\mathbf{v})\,\mathrm{d}^3x\,\mathrm{d}^3v \tag{5}$$



2 This latter step, however, has been shown to be conceptually incorrect if one allows for arbitrary ways of coarse-graining (Soker 1996). Indeed, counter-examples to the mixing theorem can easily be constructed by tuning the coarse-graining (Kandrup 1987; Sridhar 1987). The difficulty seems to arise from the very concept of coarse-graining, which lacks precise pre-conditions or even a precise definition. This does not, however, imply that a statement like the $H$-theorem cannot generally be made, or that mixing is unimportant for stellar dynamics. I postpone a more detailed discussion of these issues to section 3.
via $\mathcal{M}(V(f)) = M(f)$. This is a stronger statement than that of the increase of entropy, since it holds for all values of $V$. Unfortunately, the usability of this theorem is restricted by the complicated definition of $\mathcal{M}(V)$, which evades a simple interpretation and manipulation. For instance, $\mathcal{M}(V)$ is not additive and, if $V(f)$ is not invertible, it is not well defined. This is the case for the initial state of the example in Fig. 1, where $V = 1$ for $f < 1$ and $V = 0$ for $f \ge 1$ (known as the 'water-bag model').

The purpose of this paper is to present in section 2 a novel 
approach to mixing which is conceptually different from that of 
Tremaine et al. and avoids their conceptual problems. This leads 
to a novel form of the mixing theorem in terms of a new concept, 
the excess-mass function, which is simple to apply and easy to interpret. I discuss the relation to Tremaine et al.'s work and the controversy about it in section 3. In section 4, simple examples are given and the asymptotic behaviour at small and large values of $F$, corresponding, respectively, to large and small radii, is considered. These are applied in section 5 to the merging of cusped galaxies. Finally, section 6 discusses the applicability to $N$-body simulations and section 7 concludes.



2 MIXING 

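In order to simplify the following discussion, it is worth introducing the 'volume distribution function' (e.g. Tremaine et al. 1986)

$$\nu(f) = \int \delta\big(F(\mathbf{x},\mathbf{v}) - f\big)\,\mathrm{d}^3x\,\mathrm{d}^3v, \tag{6}$$

which refers to the phase-space volume at which $F(\mathbf{x},\mathbf{v}) = f$. Using $\nu(f)$, we can re-write the cumulative volume and mass and any $H$-functional as

$$V(f) = \int_f^\infty \nu(\phi)\,\mathrm{d}\phi, \tag{7}$$

$$M(f) = \int_f^\infty \phi\,\nu(\phi)\,\mathrm{d}\phi, \tag{8}$$

$$H[F] = -\int_0^\infty \nu(\phi)\,C(\phi)\,\mathrm{d}\phi. \tag{9}$$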



Note that $\nu(f)$, $V(f)$, $M(f)$, and $\mathcal{M}(V)$ are functionals of $F(\mathbf{x},\mathbf{v})$
as well as functions of their parameter $f$ or $V$. We can now model
mixing directly as an operation on phase-space volumes. 



2.1 Infinitesimal mixing events 

The process of mixing and subsequent coarse-graining of the distribution function can be described as a sequence of infinitesimal mixing events, in which a phase-space element with infinitesimal volume $\mathrm{d}V_l$ and density $f_l$ mixes completely with another volume $\mathrm{d}V_h$ having density $f_h > f_l$. Because of conservation of mass and of phase-space volume, the resulting element has volume $\mathrm{d}V_l + \mathrm{d}V_h$ and density

$$f_m = \frac{\mathrm{d}V_l\,f_l + \mathrm{d}V_h\,f_h}{\mathrm{d}V_l + \mathrm{d}V_h} \tag{10}$$

(Mathur 1988). The change in the volume distribution function due to such an event is

$$\mathrm{d}\nu(f) = (\mathrm{d}V_l + \mathrm{d}V_h)\,\delta(f - f_m) - \mathrm{d}V_l\,\delta(f - f_l) - \mathrm{d}V_h\,\delta(f - f_h). \tag{11}$$

Mixing events with $f_l = f_h$ do not affect $\nu(f)$ and hence may be called 'adiabatic'.



Figure 2. In a plot of cumulative volume $V(\phi)$ vs. phase-space density $\phi$, the excess mass $D(f)$ is given by the horizontally shaded region (equation 14), which equals $M(f)$ (total shaded region) minus $fV(f)$ (vertically shaded; equation 15).



2.2 A lemma on mixing 

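Consider the following function

$$D(f) = \int_{F(\mathbf{x},\mathbf{v})>f}\big(F(\mathbf{x},\mathbf{v}) - f\big)\,\mathrm{d}^3x\,\mathrm{d}^3v, \tag{12}$$

which may be re-written as

$$D(f) = \int_f^\infty (\phi - f)\,\nu(\phi)\,\mathrm{d}\phi \tag{13}$$

$$D(f) = \int_f^\infty V(\phi)\,\mathrm{d}\phi \tag{14}$$

$$D(f) = M(f) - f\,V(f). \tag{15}$$

As is obvious from these relations, the excess-mass function $D(f)$ refers to the excess mass due to values of $F > f$ (see also Fig. 2).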

Mixing lemma. Mixing of phase-space volumes where F(x,v) < 
f with volumes where F(x, v) > f decreases D(f); other mixing 
processes leave D(f) unchanged. 

Proof: First, consider an infinitesimal mixing event. The changes it imposes on $D(f)$ are easily found from equations (11) and (13):

$$\mathrm{d}D(f) = \begin{cases} 0 & \text{for } f \le f_l,\\ -(f - f_l)\,\mathrm{d}V_l & \text{for } f_l < f \le f_m,\\ -(f_h - f)\,\mathrm{d}V_h & \text{for } f_m < f \le f_h,\\ 0 & \text{for } f_h < f. \end{cases} \tag{16}$$

Thus, $\mathrm{d}D(f) < 0$ for $f_l < f < f_h$ and $\mathrm{d}D(f) = 0$ otherwise. Since the whole mixing process is a sequence of infinitesimal mixing events, the change in $D(f)$ is the integral over many infinitesimal changes $\mathrm{d}D(f)$ and the lemma follows.

The largest change of $D(f)$ due to an infinitesimal mixing event occurs at $f = f_m$ and is $\mathrm{d}D(f_m) = -|f_h - f_l|\,\mathrm{d}V_l\,\mathrm{d}V_h/(\mathrm{d}V_l + \mathrm{d}V_h)$.
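The lemma can be checked on a toy system. The sketch below is an illustration under simplifying assumptions: phase space is represented by a few discrete cells, each carrying a density and a volume (the cell values are made up), and one complete mixing event (10) is applied to two of them. It evaluates the change of $D(f)$ on a grid of thresholds, which should follow the pattern of equation (16).

```python
from typing import List, Tuple

Cell = Tuple[float, float]  # (phase-space density F, phase-space volume dV)


def excess_mass(cells: List[Cell], f: float) -> float:
    """Discrete form of eq. (12): sum of (F - f) dV over cells with F > f."""
    return sum((F - f) * dV for F, dV in cells if F > f)


def mix(cells: List[Cell], i: int, j: int) -> List[Cell]:
    """Completely mix cells i and j, conserving mass and phase-space volume (eq. 10)."""
    (F_i, V_i), (F_j, V_j) = cells[i], cells[j]
    V_m = V_i + V_j
    F_m = (F_i * V_i + F_j * V_j) / V_m
    others = [c for k, c in enumerate(cells) if k not in (i, j)]
    return others + [(F_m, V_m)]


cells = [(0.2, 1.0), (1.0, 0.5), (3.0, 0.25)]
mixed = mix(cells, 0, 2)                  # f_l = 0.2, dV_l = 1; f_h = 3, dV_h = 0.25
f_m = mixed[-1][0]
grid = [0.01 * k for k in range(0, 401)]  # thresholds 0 <= f <= 4
dD = [excess_mass(mixed, f) - excess_mass(cells, f) for f in grid]
print(f"f_m = {f_m:.3f}, largest decrease = {min(dD):.3f}")
```

With these numbers $f_m = 0.76$, $D(f)$ is unchanged for $f \le 0.2$ and $f \ge 3$, and the largest decrease, $-2.8 \times 1 \times 0.25/1.25 = -0.56$, occurs at $f = f_m$.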



2.3 Further properties of the excess-mass function 

Apart from the relations (13), (14), and (15), the function $D(f)$ has the following properties. First,

$$D'(f) = -V(f), \tag{17}$$

$$D''(f) = \nu(f). \tag{18}$$

Since both $\nu(f)$ and $V(f)$ are non-negative, this implies that $D(f)$ is non-negative and monotonically declining with everywhere non-negative curvature.



4 Walter Dehnen 



Second, since $f^2\nu(f) \to 0$ as $f \to \infty$ (as required for $M(f)$ not to diverge), and because of equation (18),

$$\lim_{f\to\infty} D(f) = 0. \tag{19}$$

Third, for a system with finite mass,

$$D(0) = M_{\mathrm{total}}. \tag{20}$$



Fourth, changes in $D(f)$ are related to a change of the entropy $S$ via

$$\Delta S = -\int_0^\infty \frac{\Delta D(f)}{f}\,\mathrm{d}f, \tag{21}$$

which becomes obvious at the end of the next section.
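Relation (21) can be checked for a single mixing event. In this sketch (toy cells with invented values, entropy in units of $k_{\mathrm{B}}$) the entropy change is computed once directly as $-\sum F\ln F\,\mathrm{d}V$ and once by integrating $-\Delta D(f)/f$ numerically with the trapezoidal rule; the two numbers should agree to the accuracy of the quadrature. By (16), $\Delta D$ vanishes outside $[f_l, f_h]$, so these finite limits suffice.

```python
import math


def excess_mass(cells, f):
    """Discrete D(f): sum of (F - f) dV over cells with F > f (eq. 12)."""
    return sum((F - f) * dV for F, dV in cells if F > f)


def entropy(cells):
    """S = -sum F ln F dV, i.e. the H-functional with C(F) = F ln F (units of k_B)."""
    return -sum(F * math.log(F) * dV for F, dV in cells)


def entropy_change_from_D(before, after, f_lo, f_hi, n=20000):
    """Delta S = -int Delta D(f) / f df (eq. 21), trapezoidal rule on [f_lo, f_hi]."""
    h = (f_hi - f_lo) / n
    total = 0.0
    for k in range(n + 1):
        f = f_lo + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * (excess_mass(after, f) - excess_mass(before, f)) / f
    return -total * h


before = [(0.2, 1.0), (1.0, 0.5), (3.0, 0.25)]
F_m = (0.2 * 1.0 + 3.0 * 0.25) / 1.25        # mix the first and last cell (eq. 10)
after = [(1.0, 0.5), (F_m, 1.25)]
direct = entropy(after) - entropy(before)
via_D = entropy_change_from_D(before, after, 0.2, 3.0)
print(f"Delta S directly: {direct:.6f}, via eq. (21): {via_D:.6f}")
```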

Finally, the combined excess-mass function of several disjoint systems (whose distribution functions do not overlap) is simply given by the sum of the individual excess-mass functions. In general, i.e. for partially overlapping systems, the excess-mass function is super-additive:

$$D_{1+2}(f) \ge D_1(f) + D_2(f). \tag{22}$$

This is directly related to the sub-additivity of the entropy (e.g. Wehrl 1978) and follows from the definition (12) of $D(f)$ and the fact that $F_{1+2} = F_1 + F_2$.
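Additivity and super-additivity are easy to see on a discretised phase space. The sketch below assumes, purely for illustration, four cells of equal volume and made-up densities: once for a second system occupying different cells (disjoint) and once for one sharing the cells of the first (overlapping).

```python
def excess_mass(F, dV, f):
    """D(f) for densities F on cells of equal phase-space volume dV (discrete eq. 12)."""
    return sum((Fk - f) * dV for Fk in F if Fk > f)


dV = 1.0
F1 = [2.0, 1.0, 0.0, 0.0]
F2_disjoint = [0.0, 0.0, 3.0, 1.0]   # occupies different cells than F1
F2_overlap = [1.0, 2.0, 0.0, 0.0]    # occupies the same cells as F1

for F2 in (F2_disjoint, F2_overlap):
    F12 = [a + b for a, b in zip(F1, F2)]    # F_{1+2} = F_1 + F_2
    for f in (0.0, 0.5, 1.5, 2.5):
        lhs = excess_mass(F12, dV, f)
        rhs = excess_mass(F1, dV, f) + excess_mass(F2, dV, f)
        print(f"f={f}: D_1+2={lhs}, D_1+D_2={rhs}")
```

For the disjoint pair both columns coincide; for the overlapping pair, e.g. at $f = 1.5$, $D_{1+2} = 3$ while $D_1 + D_2 = 1$, illustrating inequality (22).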



2.4 A mixing theorem

The above lemma is closely related to a statement made by Mathur (1988). In fact, his function $P(f)$, for which he only gives the second derivative, is identical to the change in $D(f)$ induced by mixing, and his equation (7) is equivalent to my (16).

The relation to the theorem given by Tremaine et al. (1986) and outlined in the introduction is more subtle. In the proof of their theorem, Tremaine et al. construct the function $D(f)$, because it actually is (the negative of) an $H$-functional of $F$. In fact, as pointed out by Mathur (1988), $D(f)$ and $\mathcal{M}(V)$ are related by a Legendre transformation³, as is obvious from equation (15). In particular, $\mathcal{M}'(V) = f$, $\mathcal{M}''(V) = -1/\nu(f)$, and $D'$ (equation 17) is the (negative of the) inverse of $\mathcal{M}'$, by definition of a Legendre transform. This is directly related to the fact that

$$\left(\frac{\partial\mathcal{M}}{\partial\tau}\right)_V = \left(\frac{\partial D}{\partial\tau}\right)_f, \tag{23}$$

where $\tau$ is a time-like variable describing the evolution due to mixing. Equation (23) in conjunction with the theorem given by Tremaine et al. (1986) implies the following alternative form of the mixing theorem.

Mixing theorem. The distribution function $F_2(\mathbf{x},\mathbf{v})$ is more mixed than $F_1(\mathbf{x},\mathbf{v})$ if and only if $D_2(f) \le D_1(f)$ for all $f$.

Proof: Suppose $F_2$ is more mixed than $F_1$. Then $D_2(f) \le D_1(f)$, since $D(f)$ is the negative of an $H$-functional of $F(\mathbf{x},\mathbf{v})$ with the convex function

$$C(F) = \begin{cases} 0 & \text{for } F \le f,\\ F - f & \text{for } F > f. \end{cases} \tag{24}$$

Conversely, suppose $D_2(f) \le D_1(f)$ for all $f$. From equation (9),

$$H_2 - H_1 = -\int_0^\infty (\nu_2 - \nu_1)\,C\,\mathrm{d}f. \tag{25}$$

Integrating by parts and using $V' = -\nu$ yields

$$H_2 - H_1 = \Big[(V_2 - V_1)\,C\Big]_0^\infty - \int_0^\infty (V_2 - V_1)\,C'\,\mathrm{d}f. \tag{26}$$

The first term on the right-hand side vanishes, and integrating the second term by parts using (17) gives

$$H_2 - H_1 = \Big[(D_2 - D_1)\,C'\Big]_0^\infty - \int_0^\infty (D_2 - D_1)\,C''\,\mathrm{d}f. \tag{27}$$

Again the first term on the right-hand side vanishes; the second term is non-negative, since $D_2 \le D_1$ by assumption and $C'' \ge 0$ by definition of convexity. Hence $H_2 \ge H_1$, which completes the proof.

Together with the above lemma, this theorem is another proof of the $H$-theorem (mixing increases or preserves but never decreases any $H$-functional).

Since the entropy $S$ is the $H$-functional with $C(f) = f\ln f$, so that $C'' = 1/f$, the relation (21) between changes in entropy and $D(f)$ follows directly from equation (27).

3 Mathur failed to derive $D(f)$ itself, but based his statement on equation (18), which he used as a definition; he also, strangely, considered negative values of $f$.

2.5 Diluting phase-space density

The theorem above is more useful than the equivalent theorem by Tremaine et al. (1986) because the function $D(f)$ is easier to comprehend and manipulate than $\mathcal{M}(V)$. In particular, the additivity of $D(f)$ is of great value. However, the lemma of §2.2 is of even greater practical significance, because it allows us to relate the change in $D(f)$ directly to mixing events 'across' $F = f$.

The total change of $D(f)$ in a mixing process may be obtained by integrating the infinitesimal change (16) over a function which specifies for each pair $(f_l, f_h)$ how much phase space at $f_l$ mixes with how much phase space at $f_h$ (Mathur 1988). However, it is not clear how such a function may be obtained; moreover, in the end the information contained in this function is reduced to the one-dimensional change in $D(f)$. One may instead assume a simple form for this function, resulting in simple mixing models.

For instance, one may assume that all of $\nu(f)$ gets mixed with an empty volume of size $\alpha(f)\nu(f)$. This complete 'mixing with air' simply dilutes the phase-space density, $F \to F/[1 + \alpha(F)]$, and gives

$$D_{\mathrm{final}}(f) = D_{\mathrm{initial}}\big([1 + \alpha(f)]\,f\big). \tag{28}$$

The dilution function $\alpha(f)$ is always well-defined and can be measured directly from $N$-body experiments, by estimating $D(f)$ before and after a violent mixing process. Essentially, $\alpha(f)$ gives the equivalent amount of mixing with air necessary to generate a certain evolution of $D(f)$.
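For a constant dilution $\alpha$, relation (28) can be verified directly on a discretised system: merging every cell with an empty volume $\alpha\,\mathrm{d}V$ maps $F \to F/(1+\alpha)$ and $\mathrm{d}V \to (1+\alpha)\,\mathrm{d}V$. The cell values and $\alpha = 0.5$ below are arbitrary choices for this sketch, which does not cover a density-dependent $\alpha(F)$.

```python
def excess_mass(cells, f):
    """Discrete D(f): sum of (F - f) dV over cells with F > f (eq. 12)."""
    return sum((F - f) * dV for F, dV in cells if F > f)


def dilute(cells, alpha):
    """'Mixing with air': every cell is merged with an empty volume alpha * dV.

    Mass is conserved, so F -> F / (1 + alpha) and dV -> (1 + alpha) * dV.
    """
    return [(F / (1.0 + alpha), dV * (1.0 + alpha)) for F, dV in cells]


alpha = 0.5      # assumed constant dilution, chosen for illustration
initial = [(0.2, 1.0), (1.0, 0.5), (3.0, 0.25), (7.0, 0.05)]
final = dilute(initial, alpha)
for f in (0.1, 0.5, 1.0, 2.0, 4.0):
    # left: D after dilution at f; right: D before dilution at (1 + alpha) f
    print(f, excess_mass(final, f), excess_mass(initial, (1.0 + alpha) * f))
```

Both columns agree, the total mass $D(0)$ is unchanged, and $D(f)$ decreases at every $f > 0$, as required by the lemma.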



3 COARSE-GRAINING 
3.1 Conceptual problems 

As mentioned in footnote 2 above, the $H$-theorem of Tremaine et al. (1986) has met immediate rejection (Dejonghe 1987; Kandrup 1987; Sridhar 1987). These authors provided simple counter-examples of non-mixing systems whose $H$-functionals are not conserved or even decrease, and pointed to the following conceptual problem in the argumentation. Tremaine et al. have actually only proven that $H$-functionals increase as a consequence of coarse-graining, $H[\bar F(\mathbf{x},\mathbf{v})] \ge H[F(\mathbf{x},\mathbf{v})]$, independent of the actual dynamics or indeed mixing. From that, they argued that initially $\bar F = F$ (which can be guaranteed by definition of the arrow of time; Tremaine, private communication), but $\bar F \ne F$ at a later time $t$, and hence $H[\bar F(0)] \le H[\bar F(t)]$. However, this argument does not guarantee that $H[\bar F(0)] \le H[\bar F(t_1)] \le H[\bar F(t)]$ at intermediate times $t_1$.

Unfortunately, this controversy undermined the whole subject of mixing in stellar dynamics. Some sceptics argue that the fine-grained distribution function never suffers information loss and hence that the $H$-theorem is a pure artifact of coarse-graining (implying that the entropy of a collisionless stellar system is constant). This argument, however, relies on the infinite resolution of the fine-grained distribution function and ignores the fact that no stellar system can support infinite resolution. The fine-grained distribution function and the collisionless Boltzmann equation (CBE) only give an approximate description of collisionless stellar dynamics (e.g. Dejonghe 1987). In the presence of mixing, the continuum limit, on which this approximation rests, becomes invalid. Mixing is an irreversible process, as information about the state prior to mixing is lost, representing a true entropy increase in the sense understood by Boltzmann (Merritt 1999).

The conceptual problems related to the $H$-theorem originate from the fact that the details of and requirements for the coarse-graining operation are not specified and hence are usually considered unimportant. Many authors consider a static coarse-graining operation, such as averaging over time-independent macro cells or convolution with a window function. For such ways of coarse-graining, Soker (1996) showed that $H[\bar F]$ does not obey an $H$-theorem⁴. The reason is easily understood when considering a non-mixing non-equilibrium (e.g. periodic) system. Since the system evolves, the effectively resolved mass per fixed macro cell evolves too, so that $H[\bar F]$ is not necessarily conserved.

3.2 A novel interpretation of coarse-graining 

In this situation, it is instructive to consider the proof of the H- 
theorem from the previous section. Unlike Tremaine et al.'s proof, 
it does not employ coarse-graining with finite macro cells. Rather 
mixing is described directly as (integral over) averaging of in- 
finitesimal phase-space volumes. In this description, the astrophys- 
ical process of mixing (and the resulting loss of information) is 
accounted for, in our description of the system, by a local aver- 
aging, the coarse-graining. Thus, mixing and coarse-graining are 
intimately related and the latter must not be considered arbitrary. 

This is directly related to the interpretation of the coarse-grained distribution function $\bar F$. Traditionally, $\bar F$ is introduced because its fine-grained counterpart $F$ does not tend to equilibrium, but undergoes ever stronger small-scale fluctuations (e.g. Chavanis 1998). In this picture, $\bar F$ gives an otherwise unspecified, finite-resolution representation of the system. As already mentioned, the fine-grained fluctuations of $F$ will eventually break the validity of the continuum approximation, i.e. below some level, these fluctuations are artificial and not representative of the actual stellar system.

These arguments suggest the interpretation of $\bar F$ as our best possible description of the stellar system, avoiding the artifacts of its fine-grained counterpart. In this interpretation, coarse-graining must meet the following conditions.

4 As a simple example, consider $F$ to be the sum of two $\delta$-functions. If the two points are close enough to be within one macro cell, $\bar F_{\max}$ is twice as large as otherwise.
Figure 3. The excess-mass function D(f) for the (non-singular) isothermal sphere.

1. In the limit of a system with infinite resolution: $\bar F \to F$.

2. $\bar F$ must be a faithful representation of the system.

3. In the absence of mixing: $\mathrm{d}_t\bar F = 0$.

The first two conditions imply that coarse-graining must be local. The second condition ensures that coarse-graining only deletes information in $F$ but not in our description of the stellar system; in particular, $\bar{\bar F} = \bar F$. This means that coarse-graining is done on the local resolution scale of the system itself, which seems the most natural scale to use, but excludes static coarse-graining. In particular, any moment of the stellar system must agree (within its statistical uncertainty) with the corresponding moment of $\bar F$. Finally, the last condition guarantees that $\bar F$ is altered by mixing only, i.e. under ordinary non-mixing circumstances all information about the system is preserved in $\bar F$. This immediately warrants the $H$-theorem.

It is not clear whether and how these conditions can be met in a practical implementation. However, even if we could only generate an approximation to this ideal, the above conditions and the underlying interpretation would still be valuable. For instance, we would allow for an approximation error $\hat{\bar F} - \bar F$ (with the hat denoting approximation), which in turn might result in spurious but accountable violations of the $H$-theorem. Clearly, a more detailed investigation of these issues is beyond the scope of this paper.

We should stress that the idea of $\bar F(\mathbf{x},\mathbf{v},t)$ being the best possible description of the stellar system is not necessarily consistent with other approaches. For instance, Chavanis (1998; see also Chavanis & Bouchet 2005) considers $\bar F$ to contain genuinely reduced information and, hence, the fluctuations of $F$ to be (at least partially) real rather than entirely artificial. It may be possible to reconcile this with our ideas by altering the above conditions to allow for an arbitrary resolution (in terms of the mass or number of stars per resolution element) without spoiling the $H$-theorem; however, then $\mathrm{d}_t\bar F \ne 0$ because of the forces generated by the fluctuations (Chavanis 1998).



4 EXAMPLES 

4.1 Asymptotics at large $f$

4.1.1 Density cores: limited phase-space densities 

Let us consider the (non-singular) isothermal sphere (e.g., Binney & Tremaine 1987). I have numerically obtained the potential $\Phi(r)$ as well as the phase-space volume $g(E)$ at constant energy. The volume distribution function is then given by $g(E)/|\mathrm{d}F/\mathrm{d}E|$ and $D(f)$ can be computed using equation (13). The result is plotted in Figure 3. The mass of the isothermal sphere is infinite and hence $D \to \infty$ as $f \to 0$. The maximum phase-space density is $f_{\max} = \rho_0/(2\pi\sigma^2)^{3/2}$, where $\sigma$ and $\rho_0$ denote the velocity dispersion and central density, respectively. At $f \approx f_{\max}$, the excess-mass function decays like $D \propto (f_{\max} - f)^4$.



4.1.2 Density cusps: unlimited phase-space densities⁵

Let us consider a self-gravitating stellar system whose density at small radii is given by a power law in radius, corresponding to a density cusp

$$\rho(\mathbf{x}) = \rho_0\,g(\hat{\mathbf{x}})\,x^{-\gamma} \tag{29}$$



with $\hat{\mathbf{x}} = \mathbf{x}/|\mathbf{x}|$ the unit vector in the direction of $\mathbf{x}$ and the dimensionless radius $x = |\mathbf{x}|/r_u$, where $r_u$ is the unit of length. The parameters $\rho_0$ and $0 < \gamma < 3$ are, respectively, a density normalisation and the cusp strength. The dimensionless and continuous function $g(\hat{\mathbf{x}})$ determines the shape of surfaces of equal density. I proceed by assuming that the distribution function is of the form

$$F = \rho_0\,v_0^{-3}\,h(X)\,f(E) \tag{30}$$

with the constant $v_0$ given in equation (A2). Here $E = v^2/2 + \Phi(\mathbf{x})$ is the energy, while $X = X(\mathbf{x},\mathbf{v})$ denotes a set of scale-invariant integrals of motion. Scale invariance in this case means that

$$X(\mathbf{x},\mathbf{v}) = X\big(a\mathbf{x},\,a^{1-\gamma/2}\mathbf{v}\big) \tag{31}$$

for any dimensionless scale factor $a$. Examples of scale-invariant properties of stellar orbits are the eccentricity and ratios between orbital frequencies or actions. After some algebra (see appendix A), the excess-mass function of the self-gravitating cusp is found to be

$$D(f) = D_{\gamma,h}\,r_u^3\,\rho_0\,\big(f^2 G^3 \rho_0\, r_u^6\big)^{-(3-\gamma)/(6-\gamma)}. \tag{32}$$



Here, $D_{\gamma,h}$ is a dimensionless constant given in equation (A13), which for the spherical ($g = 1$) and isotropic ($h = 1$) case reduces to a simple expression (see appendix A).

The (absolute value of the) exponent in $D \propto f^{-2(3-\gamma)/(6-\gamma)}$ varies only between 1 for $\gamma \to 0$ and 0 for $\gamma \to 3$ and decreases with increasing $\gamma$. Hence $D(f)$ falls off more slowly at large $f$ for a steeper cusp, i.e. a steeper cusp is less mixed than a shallower one. Moreover, this asymptotic behaviour of $D(f)$ depends only on the cusp strength and not on the details of the density contours or the distribution function, as long as it is scale-invariant.

For a non-self-gravitating system with density $\propto r^{-\gamma}$ immersed in a gravitational field generated by an overall mass density $\rho \propto r^{-\beta}$ with $\beta > \gamma$, one finds by a similar analysis

$$D \propto f^{-2(3-\gamma)/(6+2\gamma-3\beta)}. \tag{33}$$



4.1.3 Stellar systems dominated by a super-massive black hole

The case of a stellar system whose dynamics is dominated by a super-massive black hole corresponds to $\beta = 3$ in equation (33), i.e. gives $D \propto f^{-2(3-\gamma)/(2\gamma-3)}$. Note that the phase-space density of such a system is unlimited only for $\gamma > 3/2$. The exponents in this relation vary between $\infty$ for $\gamma \to 3/2$ and 0 for $\gamma \to 3$. Thus, again steeper cusps are less mixed, but the differences are much more pronounced than for self-gravitating cusps.
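The large-$f$ slopes quoted in equations (32) and (33) and for the black-hole case are simple enough to tabulate. The helper names below are made up for this sketch; each returns the positive number $p$ in $D \propto f^{-p}$.

```python
def p_self_gravitating(gamma):
    """|d ln D / d ln f| for a self-gravitating cusp rho ~ r^-gamma, eq. (32)."""
    return 2.0 * (3.0 - gamma) / (6.0 - gamma)


def p_external(gamma, beta):
    """Same for a cusp rho ~ r^-gamma in the potential of a density rho ~ r^-beta, eq. (33)."""
    return 2.0 * (3.0 - gamma) / (6.0 + 2.0 * gamma - 3.0 * beta)


def p_black_hole(gamma):
    """Black-hole dominated case, beta = 3 in eq. (33); meaningful only for gamma > 3/2."""
    return p_external(gamma, 3.0)


for g in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    line = f"gamma={g:3.1f}  self-gravitating: {p_self_gravitating(g):.3f}"
    if g > 1.5:
        line += f"  black hole: {p_black_hole(g):.3f}"
    print(line)
```

The self-gravitating slope falls from 1 to 0.286 between $\gamma = 0$ and $2.5$, whereas the black-hole slope drops from 2 to 0.5 between $\gamma = 2$ and $2.5$ alone, showing the much stronger dependence noted above.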

5 For any real system, the resolution is of course finite and hence the phase-space density is limited. However, we assume here that the resolution is high enough for the asymptotic limit to be useful over a range of densities.



4.2 Asymptotics at small $f$

Next, consider a stellar system of finite mass. For simplicity, I consider the case of spherical density only. The potential in the outer parts is dominated by the monopole, i.e. $\Phi = -GM/r$, while the density is assumed to be of the form $\rho = \rho_0 (r/r_u)^{-\eta}$ with $\eta > 3$ for the mass not to diverge at $r \to \infty$. A distribution function of the form $F \propto L^{-2\beta} f(E)$ always generates a constant Binney anisotropy $\beta = 1 - \sigma_\theta^2/\sigma_r^2$ (Cuddeford 1991). For the case considered here, this gives

$$F = \rho_0\,(GM/r_u)^{-3/2}\,f_{\eta\beta}\,X^{-\beta}\,\mathcal{E}^{\eta-3/2}, \tag{34}$$

where $\mathcal{E} = -r_u E/GM$, $X = L^2/L_c^2(E) = 1 - e^2$ (orbital circularity), and $f_{\eta\beta}^{-1} = 2^{3/2-2\beta}\,\pi\,B(\tfrac12, 1-\beta)\,B(\tfrac32-\beta, \eta-\tfrac12-\beta)$. The phase-space volume at fixed $(\mathcal{E}, X)$,

$$g(\mathcal{E}, X) = \sqrt{2}\,\pi^3\,(GMr_u)^{3/2}\,\mathcal{E}^{-5/2}, \tag{35}$$

is independent of $X$. From these, one can obtain $\nu(f)$ and

$$D(f) = M_D - \rho_0\,r_u^3\,D_{\eta\beta}\left(\frac{f^2G^3M^3}{\rho_0^2\,r_u^3}\right)^{(\eta-3)/(2\eta-3)} \tag{36}$$

with $M_D \le M$ the total mass of the density component considered and

$$D_{\eta\beta} = \frac{\sqrt{2}\,\pi^3\,(2\eta-3)^2\,f_{\eta\beta}^{3/(2\eta-3)}}{3(\eta-3)(2\eta-3-3\beta)}. \tag{37}$$

Hence, again the asymptotic power law is independent of details like the orbital anisotropy, which only enters the coefficient $D_{\eta\beta}$.
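The closed form (36)-(37) can be cross-checked by brute force. Since $M_D - D(f) = \int \min(F, f)\,\mathrm{d}^3x\,\mathrm{d}^3v$, one may integrate $\min(F, f)$ against the phase-space volume (35) directly. The sketch below assumes units $G = M = r_u = 1$ and restricts itself to the isotropic case $\beta = 0$, where the $X$-integral is trivial; it tests only the algebra leading from (34) and (35) to (36) and (37), not the physical assumptions behind them.

```python
import math


def beta_fn(a, b):
    """Euler Beta function B(a, b) via the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def f_norm(eta, beta):
    """Normalisation f_{eta beta} of eq. (34) (units G = M = r_u = 1)."""
    return 1.0 / (2.0 ** (1.5 - 2.0 * beta) * math.pi
                  * beta_fn(0.5, 1.0 - beta) * beta_fn(1.5 - beta, eta - 0.5 - beta))


def D_coeff(eta, beta):
    """Dimensionless coefficient D_{eta beta} of eq. (37)."""
    return (math.sqrt(2.0) * math.pi ** 3 * (2 * eta - 3) ** 2
            * f_norm(eta, beta) ** (3.0 / (2 * eta - 3))
            / (3.0 * (eta - 3) * (2 * eta - 3 - 3 * beta)))


def mass_deficit_asymptotic(f, eta, beta, rho0=1.0):
    """M_D - D(f) from eq. (36) with G = M = r_u = 1."""
    return rho0 * D_coeff(eta, beta) * (f * f / rho0 ** 2) ** ((eta - 3) / (2 * eta - 3))


def mass_deficit_numeric(f, eta, rho0=1.0, n=20001, s_lo=-25.0, s_hi=25.0):
    """Brute-force M_D - D(f) = int min(F, f) dGamma for beta = 0.

    Uses g(E) = sqrt(2) pi^3 E^(-5/2) from eq. (35) and the trapezoidal rule in s = ln E.
    """
    A = rho0 * f_norm(eta, 0.0)               # F = A * E^(eta - 3/2)
    K = math.sqrt(2.0) * math.pi ** 3
    h = (s_hi - s_lo) / (n - 1)
    total = 0.0
    for k in range(n):
        E = math.exp(s_lo + k * h)
        w = 0.5 if k in (0, n - 1) else 1.0
        # E^(-5/2) * min(F, f) * dE/ds, with dE/ds = E
        total += w * E ** -1.5 * min(A * E ** (eta - 1.5), f)
    return K * total * h


f = 1e-3
print(mass_deficit_numeric(f, 4.0), mass_deficit_asymptotic(f, 4.0, 0.0))
```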



4.3 Excess-mass functions of the $\gamma$-models

Figure 4 shows the excess-mass function of the spherical $\gamma$-models (Dehnen 1993; Tremaine et al. 1994, with $M$ and $a$ denoting total mass and scale radius)

$$\rho(r) = \frac{(3-\gamma)\,M}{4\pi}\,\frac{a}{r^\gamma\,(r+a)^{4-\gamma}} \tag{38}$$

with isotropic velocity distribution. Evidently, $D_1 < D_2$ for $\gamma_1 < \gamma_2$; thus $\gamma$-models are less mixed with increasing $\gamma$. The bottom panel shows $D/(1 - D/M_{\mathrm{tot}})$, which makes it easier to distinguish between the excess-mass functions at $f \to 0$. The line shows the asymptotic slope predicted by equation (36).



5 APPLICATION: MERGING CUSPS 

As an application, consider the merging of several cusped galaxies or dark-matter haloes. Because of its additivity, the combined $D(f)$ prior to the merger is equal to the sum of those of the progenitors. When equilibrium has been re-established after the merger, $D_r(f)$ of the merger remnant must satisfy

$$D_r(f) \le D_\Sigma(f) = \sum_i D_i(f). \tag{39}$$
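Condition (39) can be probed numerically for pure power-law excess-mass functions of the form (32). In the sketch below the coefficients and cusp slopes are made-up values, power laws are assumed to hold at all $f$, and the hypothetical helper `first_violation` scans $f$ over decades and reports the first value at which the remnant would violate (39).

```python
def p_cusp(gamma):
    """Large-f slope of a self-gravitating cusp, D ~ f^(-p), from eq. (32)."""
    return 2.0 * (3.0 - gamma) / (6.0 - gamma)


def first_violation(progenitors, remnant, decades=range(0, 41)):
    """Return the first f = 10^k at which D_r(f) > sum_i D_i(f), or None.

    progenitors: list of (coefficient, gamma) with D_i = c_i * f^(-p(gamma_i));
    remnant: (coefficient, gamma) of D_r, assumed to be of the same form.
    """
    c_r, g_r = remnant
    for k in decades:
        f = 10.0 ** k
        d_sum = sum(c * f ** -p_cusp(g) for c, g in progenitors)
        if c_r * f ** -p_cusp(g_r) > d_sum:
            return f
    return None


progenitors = [(1.0, 1.0), (1.0, 1.5)]
print(first_violation(progenitors, (0.5, 1.5)))    # remnant as steep as the steepest progenitor
print(first_violation(progenitors, (1e-3, 2.0)))   # remnant steeper than all progenitors
```

A remnant cusp no steeper than the steepest progenitor never violates (39), whereas a steeper one eventually does, however small its normalisation (here at $f \sim 10^{19}$).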



5.1 Constraints on the cusp strength 

Let us first consider the combined excess-mass function of the 
progenitors on the right-hand side of (39). Each /?,(/) is of the 
form (32): D t oc y-2( 3 -r;)/< 6 -ri). Thus, at sufficiently large phase- 
space densities £>j; will be dominated by the steepest progenitor 
cusp. Next suppose the remnant also forms a scale-free cusp, such that its excess-mass function $D_r \propto f^{-2(3-\gamma_r)/(6-\gamma_r)}$. The exponent $-2(3-\gamma)/(6-\gamma)$ increases monotonically with $\gamma$ (its derivative is $6/(6-\gamma)^2 > 0$), so at large $f$ the combined excess mass of the progenitors is dominated by the steepest progenitor cusp. Condition (39) can therefore only be satisfied if $\gamma_r \le \max_i(\gamma_i)$. Thus, the remnant cusp cannot be steeper than the steepest of the progenitor cusps.

In other words, steeper cusps are less mixed than shallower cusps (in the limit $f \to \infty$) regardless of details such as the shape of the density contours or the distribution of orbital shapes (orbital anisotropy), as long as these are the same for all radii and/or energies (scale freedom). This implies that, by virtue of the mixing theorem, mergers cannot produce cusps steeper than those already present in their progenitors. Conversely, a remnant cusp shallower than the steepest of its progenitors would require an arbitrarily large dilution $a$ of the distribution function as $f \to \infty$. This, while not impossible, seems highly implausible, which strongly suggests that the remnant cusp should not be shallower than the steepest of its progenitors. Together with the above mixing constraint, this means that the maximum cusp strength is conserved when merging collisionless stellar systems.

Figure 4. Top: The excess-mass function $D(f)$ for spherical and isotropic $\gamma$ models, which have density $\rho \propto r^{-\gamma}(r+a)^{\gamma-4}$. The curves correspond to values of $\gamma$ from 0.25 (lowest) to 2.5 (uppermost), spaced by 0.25, with $\gamma = 1$ and $\gamma = 2$ plotted solid. Units are such that $G$, $a$ and the total mass equal unity. Bottom: The same, but plotting $D/(1 - D/M_{\rm tot})$, which distinguishes the models at $f \to 0$ and demonstrates that in this limit they indeed behave as equation (36) predicts for $\eta = 4$.

5.2 Constraints on the cusp mass

Let me illustrate the merging of two equal cusped galaxies in a little more detail. I assume that the remnant has the same scale-invariant structure and cusp strength $\gamma$ as its progenitors, but a different density normalisation, $\rho_{0r} \neq \rho_{0p}$. Under these circumstances, the dilution function $a(f)$ of equation (28) is constant (in the asymptotic limit considered). Together with equation (32) and the additivity, this yields



$$\rho_{0r} = \rho_{0p}\; 2^{(6-\gamma)/3}\, (1+a)^{-2(3-\gamma)/3}. \qquad (40)$$



Thus, for $\rho_{0r} = 2\rho_{0p}$ (i.e. the remnant cusp containing the sum of the progenitor-cusp masses), the dilution fraction has to be $a = \sqrt{2} - 1 \approx 41\%$, independent of $\gamma$. For the remnant cusp to be as massive as either of its progenitors ($\rho_{0r} = \rho_{0p}$), $a = 2^{(6-\gamma)/[2(3-\gamma)]} - 1$, which evaluates to 100% for $\gamma = 0$, $\approx 138\%$ for $\gamma = 1$, $\approx 183\%$ for $\gamma = 1.5$, and 300% for $\gamma = 2$. Thus, if $a$ is not strongly dependent on $\gamma$, relation (40) suggests that remnants of steep-cusp mergers have more massive cusps, relative to their progenitors, than remnants of shallow-cusp mergers. However, since steeper cusps generate stronger tides, one would actually expect $a$ to depend on $\gamma$ in a sense opposing this trend.
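Solving relation (40), $\rho_{0r} = \rho_{0p}\,2^{(6-\gamma)/3}(1+a)^{-2(3-\gamma)/3}$, for the dilution gives $1 + a = \left[2^{(6-\gamma)/3}\rho_{0p}/\rho_{0r}\right]^{3/[2(3-\gamma)]}$. The minimal Python sketch below reproduces the numbers quoted above. The function name `dilution` is hypothetical and only for illustration.

```python
import math

def dilution(gamma, density_ratio):
    """Dilution a implied by relation (40) for a merger of two equal cusps.

    gamma         -- cusp slope shared by progenitors and remnant (0 <= gamma < 3)
    density_ratio -- rho_0r / rho_0p, remnant over progenitor density normalisation
    """
    if not 0.0 <= gamma < 3.0:
        raise ValueError("cusp slope must satisfy 0 <= gamma < 3")
    # (40): rho_0r/rho_0p = 2**((6-gamma)/3) * (1+a)**(-2*(3-gamma)/3), solved for 1+a
    one_plus_a = (2.0 ** ((6.0 - gamma) / 3.0) / density_ratio) ** (3.0 / (2.0 * (3.0 - gamma)))
    return one_plus_a - 1.0

for gamma in (0.0, 1.0, 1.5, 2.0):
    print(f"gamma={gamma}: a(summed cusp mass)={dilution(gamma, 2.0):.3f}, "
          f"a(equal cusp mass)={dilution(gamma, 1.0):.3f}")
```

The equal-mass dilution $2^{(6-\gamma)/[2(3-\gamma)]} - 1$ increases monotonically with $\gamma$ and diverges as $\gamma \to 3$, which is why the sketch rejects $\gamma \ge 3$.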



6 APPLICATION TO N-BODY SIMULATIONS

One of the motivations of this study was the hope of using the excess-mass function as a diagnostic tool in the interpretation and validation of N-body experiments. To this end, it is necessary to estimate phase-space densities from N-body data. Arad, Dekel & Klypin (2004) and Ascasibar & Binney (2005) have demonstrated in two pioneering studies that this can in principle be done, even though it is a difficult task because of (i) the vastness of 6D phase space and (ii) the lack of a metric.

Once estimates $F_i$ for $F$ at the phase-space positions of the bodies have been obtained, the excess-mass function may be estimated as

$$D(f) = \sum_{F_i > f} m_i \;-\; f \sum_{F_i > f} \frac{m_i}{F_i}, \qquad (41)$$

where $m_i$ denotes the mass of the $i$th body: the first sum estimates the mass at phase-space densities above $f$, and the second the corresponding phase-space volume, each body occupying a volume $m_i/F_i$. A serious problem here is that the coarse-graining scale used in the estimation of $F$ may well be larger than the scales still resolved by the system. As discussed in section 3, this approximation error may result in unwanted surprises (such as $D$ increasing with time).
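Estimator (41) translates directly into a few lines of NumPy. The sketch below checks it, as the dotted curve of Fig. 5 does, by feeding it exact fine-grained densities. Instead of a Hernquist sphere it assumes, for simplicity, an isotropic six-dimensional Gaussian distribution function of mass $M$ and dispersion $\sigma$. For this test case $D(f) = M[1 - (f/F_0)(1 + u + u^2/2 + u^3/6)]$, with $u = \ln(F_0/f)$ and $F_0 = M/(2\pi\sigma^2)^3$ the central phase-space density. The test case and the function names are illustrative choices, not part of the paper.

```python
import numpy as np

def excess_mass_estimate(F_i, m_i, f):
    """Estimator (41): D(f) = sum_{F_i>f} m_i - f * sum_{F_i>f} m_i / F_i."""
    F_i = np.asarray(F_i, dtype=float)
    m_i = np.asarray(m_i, dtype=float)
    above = F_i > f                                   # bodies with phase-space density above f
    return m_i[above].sum() - f * (m_i[above] / F_i[above]).sum()

def gaussian_excess_mass(f, M=1.0, sigma=1.0):
    """Exact D(f) of an isotropic 6D Gaussian DF (hypothetical test case)."""
    F0 = M / (2.0 * np.pi * sigma**2) ** 3            # central (maximum) phase-space density
    if f >= F0:
        return 0.0
    u = np.log(F0 / f)
    return M * (1.0 - (f / F0) * (1.0 + u + u**2 / 2.0 + u**3 / 6.0))

# Sample N equal-mass bodies from the Gaussian and use their exact F_i.
rng = np.random.default_rng(42)
N, M, sigma = 100_000, 1.0, 1.0
w = sigma * rng.standard_normal((N, 6))               # 6D phase-space coordinates (x, v)
F0 = M / (2.0 * np.pi * sigma**2) ** 3
F_true = F0 * np.exp(-0.5 * (w**2).sum(axis=1) / sigma**2)
m = np.full(N, M / N)

for ratio in (0.01, 0.1, 0.5):
    f = ratio * F0
    print(f"f/F0={ratio}: estimate={excess_mass_estimate(F_true, m, f):.4f}, "
          f"exact={gaussian_excess_mass(f, M, sigma):.4f}")
```

With exact $F_i$ the estimate agrees with the true $D(f)$ to within Monte-Carlo noise, mirroring the paper's observation that (41) performs well whenever the $F_i$ themselves are good.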

Figure 5 demonstrates these problems when applying the phase-space density estimator FiEstAS of Ascasibar & Binney (2005) to $10^6$ points drawn from a Hernquist (1990) sphere. The top two panels show that the estimated density can be wrong by up to two orders of magnitude and is systematically too large for $f < 0.01$, while high densities are truncated, in particular when smoothing is used (top panel). The bottom panel shows the estimate of $D(f)$ obtained from equation (41) and the FiEstAS densities with (short dashed) and without (long dashed) smoothing. A comparison with the true $D(f)$ (solid) shows that the estimated $D(f)$ is only slightly too large at intermediate $f$, but is seriously in error at large $f$ (and also at very small $f$), in particular when smoothing was used. These errors are obviously related to the underestimation of large phase-space densities. This underestimation occurs at values of $F$ below those expected from the finite resolution with $N = 10^6$, since $M(f)/M_{\rm tot} = 10^{-6}$ at $f \approx 10^{4.7}$ and $10^{-5}$ at $f \approx 10^{3.5}$.

Also plotted are the estimates from equation (41) using the true fine-grained phase-space densities of the $10^6$ bodies (dotted). These can hardly be distinguished from the true $D(f)$, indicating that equation (41) is a fairly good estimator provided the estimates $F_i$ are good.

These results suggest that, in order to resolve the asymptotic behaviour of $D(f)$ at $f \to \infty$, at least $\sim 10^7$ points are required with this technique. Clearly, there must be ways to improve the situation. Apart from improvements in the existing technique, one may exploit that $\mathrm{d}_t F = 0$ (assuming no mixing is going on) in order to constrain the possible values for $F_i$.

Figure 5. Assessing the fitness of FiEstAS for estimating $D(f)$. Top: error in the FiEstAS estimates $F_i$ for $F$ from $N = 10^6$ points drawn from a Hernquist (1990) sphere ($\gamma = 1$ in Fig. 4), plotted versus their true phase-space density (only every 25th body plotted). Middle: the same, but using FiEstAS without smoothing. Bottom: excess-mass function of the same model (solid), and estimates for $D(f)$ using equation (41) with either the true fine-grained $F$ of the $N = 10^6$ points (dotted) or the FiEstAS estimates $F_i$ for the same points with (long dashed) and without (short dashed) smoothing.



7 SUMMARY AND CONCLUSION

A stellar system out of equilibrium is driven towards equilibrium by mixing of its phase-space densities in a process of violent relaxation. As a consequence, the concept of the fine-grained distribution function $F(\mathbf{x},\mathbf{v})$ is ill-suited to understanding non-equilibrium stellar dynamics. Instead, the system is better described in terms of its coarse-grained distribution function $\bar{F}(\mathbf{x},\mathbf{v})$. Mixing of phase-space elements changes $\bar{F}$ in such a way that the excess-mass function

$$D(f) = \int_{\bar{F}(\mathbf{x},\mathbf{v}) > f} \left[\bar{F}(\mathbf{x},\mathbf{v}) - f\right] \mathrm{d}^3\mathbf{x}\,\mathrm{d}^3\mathbf{v}$$

decreases (mixing lemma, section 2.2), equivalent to a statement by Mathur (1988). In fact, only events which mix densities $> f$ with densities $< f$ decrease $D(f)$. This lemma may be considered an extension of the well-known maximum phase-space density argument to all density values. $D(f)$ measures the excess mass due to phase-space densities higher than $f$, and its decrease is directly related to entropy increase, see equation (21). A useful property of $D(f)$ is its additivity: the excess-mass function of the combination of disjoint stellar systems (which do not overlap in phase space) is simply the sum of the individual $D(f)$.

In section 2.4, I prove a novel form of the mixing theorem (Tremaine, Hénon & Lynden-Bell 1986), stating that $D(f)$ decreases if and only if every H-functional of the distribution function increases. My lemma together with this theorem is an alternative proof of the H-theorem (the increase of H-functionals due to mixing), avoiding some conceptual problems associated with allowing arbitrary coarse-graining.

In section 3, the importance of the details of the coarse-graining operator for the validity of the H-theorem is discussed. It is argued that the conceptual problems of Tremaine et al.'s proof of this theorem can be avoided by requiring appropriate coarse-graining. In particular, I propose an interpretation of the coarse-grained distribution function $\bar{F}$ as the best possible description of the stellar system. In this interpretation, the astrophysical process of mixing is directly described by coarse-graining, and in the absence of mixing $\mathrm{d}_t\bar{F} = 0$, such that the H-theorem is guaranteed.

In section 4, $D(f)$ for some simple spherical equilibria is given and its asymptotic behaviour at small and large $f$ considered. For equilibria with a self-gravitating scale-invariant density cusp ($\rho \propto r^{-\gamma}$), the asymptotic behaviour at $f \to \infty$ is $D \propto f^{-2(3-\gamma)/(6-\gamma)}$, independent of the shape of the density contours and details of the distribution function. This remarkable property, together with the additivity and the mixing theorem, allowed me in section 5 to prove that a merger remnant cannot have a density cusp steeper than the steepest of its progenitors. Assuming that mixing during the merger does not become ever stronger at higher values of $F$ (which cannot be strictly excluded, but appears highly implausible), one can show that the maximum cusp strength is conserved, i.e. the remnant cusp has strength $\gamma$ equal to the maximum of its progenitors.

Clearly, the decreasing nature of the excess-mass function is not restricted to galactic dynamics, but applies to any collisionless system undergoing mixing. For instance, the inequality constraint used by Yu & Tremaine (2002, eq. 33) to describe the evolution of the population of super-massive black holes is essentially equivalent to the mixing lemma. In this case, merging of super-massive black holes mixes the distribution of their properties.
APPENDIX A: D(f) FOR DENSITY CUSPS

Here, we derive the excess-mass function for stellar systems with 
a power-law density (29) and a scale-invariant distribution function 
of the form (30). 

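The gravitational potential generated by the density (29) is

$$\Phi(\mathbf{x}) = v_0^2 \times \begin{cases} \dfrac{\psi(\hat{\mathbf{x}})}{2-\gamma}\left(\dfrac{r}{r_0}\right)^{2-\gamma} & \text{for } \gamma \neq 2, \\[1ex] \ln\psi(\hat{\mathbf{x}}) + \ln(r/r_0) & \text{for } \gamma = 2, \end{cases} \qquad (A1)$$

with

$$v_0^2 = \frac{4\pi G \rho_0 r_0^2}{3-\gamma}, \qquad (A2)$$

where $G$ denotes Newton's constant of gravity. Here, $\psi(\hat{\mathbf{x}})$ is a dimensionless shape function which is uniquely determined by $g(\hat{\mathbf{x}})$ and $\gamma$ through Poisson's equation, giving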

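$$g(\hat{\mathbf{x}}) = \begin{cases} \left[1 + \dfrac{\Lambda_\Omega}{(2-\gamma)(3-\gamma)}\right]\psi & \text{for } \gamma \neq 2, \\[1ex] 1 + \Lambda_\Omega \ln\psi & \text{for } \gamma = 2, \end{cases} \qquad (A3)$$

with $\Lambda_\Omega = (\sin\theta)^{-1}\partial_\theta \sin\theta\,\partial_\theta + (\sin\theta)^{-2}\partial_\phi^2$ the angular part of the Laplace operator. Naturally, for the spherical case $g = 1 = \psi$.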

The assumed functional form (30) for $F$ means that the distribution of scale-invariant orbital properties is the same for all energies and is determined by the function $h(X)$. For self-consistency, $F$ must generate the density (29), which uniquely determines the energy dependence, yielding

$$F = \rho_0\, v_0^{-3}\, \tau_{\gamma h}\, h(X)\, \chi^{-(6-\gamma)/2} \qquad (A4)$$

with $\tau_{\gamma h}$ a normalisation constant and

$$\chi(E) \equiv \begin{cases} \left[(2-\gamma)E/v_0^2\right]^{1/(2-\gamma)} & \text{for } \gamma \neq 2, \\[1ex] \exp(E/v_0^2) & \text{for } \gamma = 2. \end{cases} \qquad (A5)$$

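Inserting this into the self-consistency constraint $\rho(\mathbf{x}) = \int F\,\mathrm{d}^3\mathbf{v}$, with the velocity written as $\mathbf{v} = v_0 (r/r_0)^{(2-\gamma)/2}\,\mathbf{w}$, one finds after some algebra

$$g(\hat{\mathbf{x}}) = \tau_{\gamma h} \int \mathrm{d}^3\mathbf{w}\; h\big(X(r_0\hat{\mathbf{x}}, v_0\mathbf{w})\big)\, \varpi(\hat{\mathbf{x}},\mathbf{w})^{-(6-\gamma)/2} \qquad (A6)$$

with

$$\varpi(\hat{\mathbf{x}},\mathbf{w}) \equiv \begin{cases} \left[\psi(\hat{\mathbf{x}}) + (2-\gamma)\,w^2/2\right]^{1/(2-\gamma)} & \text{for } \gamma \neq 2, \\[1ex] \psi(\hat{\mathbf{x}})\, \exp(w^2/2) & \text{for } \gamma = 2. \end{cases} \qquad (A7)$$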



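The integral in equation (A6) is over all $\mathbf{w}$-space for $\gamma \le 2$ and restricted to $w^2/2 < \psi(\hat{\mathbf{x}})/(\gamma - 2)$ for $\gamma > 2$. Equation (A6) is the self-consistency constraint for the function $h(X)$ and determines the constant $\tau_{\gamma h}$. For the spherical case ($g = 1$) with isotropic velocity distribution ($h = 1$), equation (A6) yields (with $B(x,y)$ the beta function)

$$\tau_{\gamma h} = \begin{cases} \dfrac{(2-\gamma)^{3/2}}{2^{5/2}\pi\, B\!\left(\frac{3}{2}, \frac{\gamma}{2-\gamma}\right)} & \text{for } \gamma < 2, \\[2ex] \pi^{-3/2} & \text{for } \gamma = 2, \\[1ex] \dfrac{(\gamma-2)^{3/2}}{2^{5/2}\pi\, B\!\left(\frac{3}{2}, \frac{\gamma+2}{2(\gamma-2)}\right)} & \text{for } \gamma > 2, \end{cases} \qquad (A8)$$

which is continuous in $\gamma$.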
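The normalisation (A8) and its claimed continuity in $\gamma$ can be checked numerically. For $g = h = 1$ and $\psi = 1$, the self-consistency constraint reads $1 = 4\pi\tau_{\gamma h}\int w^2\,\varpi^{-(6-\gamma)/2}\,\mathrm{d}w$. Here $\varpi^{2-\gamma} = 1 + (2-\gamma)w^2/2$ for $\gamma \ne 2$, with $w^2 < 2/(\gamma-2)$ when $\gamma > 2$, and $\varpi = \exp(w^2/2)$ for $\gamma = 2$. The sketch below evaluates both the closed form and this integral. It uses a plain midpoint rule rather than any particular library quadrature, and its function names are hypothetical.

```python
import math
import numpy as np

def beta(a, b):
    """Euler beta function B(a, b), evaluated via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def tau_isotropic(gamma):
    """Normalisation tau_gamma of (A8) for a spherical (g = 1), isotropic (h = 1) cusp."""
    if not 0.0 < gamma < 3.0:
        raise ValueError("requires 0 < gamma < 3")
    if gamma < 2.0:
        return (2.0 - gamma) ** 1.5 / (2.0 ** 2.5 * math.pi * beta(1.5, gamma / (2.0 - gamma)))
    if gamma == 2.0:
        return math.pi ** -1.5
    return (gamma - 2.0) ** 1.5 / (2.0 ** 2.5 * math.pi * beta(1.5, (gamma + 2.0) / (2.0 * (gamma - 2.0))))

def velocity_integral(gamma, n=200_000):
    """Midpoint rule for 4 pi * int w^2 varpi^(-(6-gamma)/2) dw with psi = 1."""
    s = (np.arange(n) + 0.5) / n                     # midpoints of n cells on (0, 1)
    if gamma > 2.0:
        w_max = math.sqrt(2.0 / (gamma - 2.0))       # velocity cut-off w^2/2 < 1/(gamma-2)
        w = s * w_max
        integrand = (1.0 - 0.5 * (gamma - 2.0) * w**2) ** ((6.0 - gamma) / (2.0 * (gamma - 2.0)))
        return 4.0 * math.pi * np.sum(w**2 * integrand) * w_max / n
    w = s / (1.0 - s)                                # maps (0, 1) onto (0, infinity)
    jacobian = 1.0 / (1.0 - s) ** 2                  # dw/ds
    if gamma == 2.0:
        integrand = np.exp(-w**2)                    # varpi^(-2) with varpi = exp(w^2/2)
    else:
        integrand = (1.0 + 0.5 * (2.0 - gamma) * w**2) ** (-(6.0 - gamma) / (2.0 * (2.0 - gamma)))
    return 4.0 * math.pi * np.sum(w**2 * integrand * jacobian) / n

for gamma in (1.0, 1.5, 2.0, 2.5):
    print(gamma, tau_isotropic(gamma), tau_isotropic(gamma) * velocity_integral(gamma))
```

The product in the last column should equal unity, and `tau_isotropic` evaluated just below and just above $\gamma = 2$ should approach $\pi^{-3/2}$.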

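Next, the phase-space volume $g(E, X)$ at fixed $E$ and $X$ is obtained by integrating $\delta(E - v^2/2 - \Phi(\mathbf{x}))\,\delta(X - X(\mathbf{x},\mathbf{v}))$ over all phase space. By exploiting the scale invariance of $X$, one obtains after a little algebra

$$g(E,X) = r_0^3\, v_0\, g_\gamma(X)\, \chi^{(8-\gamma)/2} \qquad (A9)$$

with

$$g_\gamma(X_0) = \int \mathrm{d}^2\hat{\mathbf{x}} \int \mathrm{d}^3\mathbf{w}\; \varpi(\hat{\mathbf{x}},\mathbf{w})^{-3(4-\gamma)/2}\; \delta\big(X_0 - X(r_0\hat{\mathbf{x}}, v_0\mathbf{w})\big), \qquad (A10)$$

where $\int \mathrm{d}^2\hat{\mathbf{x}}$ denotes the integral over the unit sphere, while the integral over $\mathbf{w}$ is over the same volume as in equation (A6). For the spherical case, the phase-space volume at fixed energy, $g(E)$, may be obtained by integrating $g(E,X)$ over all $X$, which gives $g(E) = r_0^3\, v_0\, g_\gamma\, \chi(E)^{(8-\gamma)/2}$ with

$$g_\gamma = (4\pi)^2\sqrt{2} \times \begin{cases} (2-\gamma)^{-3/2}\, B\!\left(\frac{3}{2}, \frac{3}{2-\gamma}\right) & \text{for } \gamma < 2, \\[1ex] \dfrac{\sqrt{\pi}}{2\cdot 3^{3/2}} & \text{for } \gamma = 2, \\[1ex] (\gamma-2)^{-3/2}\, B\!\left(\frac{3}{2}, \frac{8-\gamma}{2(\gamma-2)}\right) & \text{for } \gamma > 2, \end{cases} \qquad (A11)$$

which is continuous in $\gamma$.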
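Similarly, the spherical constant (A11) can be compared with a direct evaluation of $g_\gamma = (4\pi)^2\int w^2\,\varpi^{-3(4-\gamma)/2}\,\mathrm{d}w$, the spherical version of the phase-space-volume integral with $\psi = 1$. Here $\varpi^{2-\gamma} = 1 + (2-\gamma)w^2/2$ for $\gamma \ne 2$ and $\varpi = \exp(w^2/2)$ for $\gamma = 2$. The sketch below also verifies continuity at $\gamma = 2$. As before, this is a minimal sketch: it uses a midpoint rule and hypothetical function names.

```python
import math
import numpy as np

def beta(a, b):
    """Euler beta function B(a, b), evaluated via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def g_gamma(gamma):
    """Dimensionless constant g_gamma of (A11), spherical case, 0 <= gamma < 3."""
    pref = (4.0 * math.pi) ** 2 * math.sqrt(2.0)
    if gamma < 2.0:
        return pref * beta(1.5, 3.0 / (2.0 - gamma)) / (2.0 - gamma) ** 1.5
    if gamma == 2.0:
        return pref * math.sqrt(math.pi) / (2.0 * 3.0 ** 1.5)
    return pref * beta(1.5, (8.0 - gamma) / (2.0 * (gamma - 2.0))) / (gamma - 2.0) ** 1.5

def g_gamma_quadrature(gamma, n=200_000):
    """Midpoint rule for (4 pi)^2 * int w^2 varpi^(-3(4-gamma)/2) dw with psi = 1."""
    s = (np.arange(n) + 0.5) / n                     # midpoints of n cells on (0, 1)
    p = 1.5 * (4.0 - gamma)                          # varpi enters as varpi^(-p)
    if gamma > 2.0:
        w_max = math.sqrt(2.0 / (gamma - 2.0))       # same velocity cut-off as for (A6)
        w, weight = s * w_max, np.full(n, w_max / n)
    else:
        w = s / (1.0 - s)                            # maps (0, 1) onto (0, infinity)
        weight = 1.0 / ((1.0 - s) ** 2 * n)          # dw/ds times cell width
    if gamma == 2.0:
        varpi_pow = np.exp(-p * w**2 / 2.0)          # varpi = exp(w^2/2)
    else:
        base = 1.0 + 0.5 * (2.0 - gamma) * w**2      # varpi^(2-gamma) for psi = 1
        varpi_pow = base ** (-p / (2.0 - gamma))
    return (4.0 * math.pi) ** 2 * np.sum(w**2 * varpi_pow * weight)

for gamma in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(gamma, g_gamma(gamma), g_gamma_quadrature(gamma))
```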

Now, one can compute the volume distribution function as

$$v(f) = \int g(E,X)\, \delta\big(f - F(E,X)\big)\, \mathrm{d}E\, \mathrm{d}X \qquad (A12)$$



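and derive the excess-mass function via equation (13). It takes the form (32), explicitly

$$D(f) = \frac{6-\gamma}{3(3-\gamma)(4-\gamma)}\; \rho_0 r_0^3\; \tau_{\gamma h}^{3(4-\gamma)/(6-\gamma)} \left(\frac{\rho_0}{v_0^3 f}\right)^{2(3-\gamma)/(6-\gamma)} \int g_\gamma(X)\, h(X)^{3(4-\gamma)/(6-\gamma)}\, \mathrm{d}X. \qquad (A13)$$

For the spherical ($g = 1$) and isotropic ($h = 1$) case, the integral in this equation is just the constant $g_\gamma$ of equation (A11), and $\tau_{\gamma h}$ is given by equation (A8).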
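As a consistency check of (A13), one can compare it with a direct quadrature over energy. In the spherical isotropic case, $D(f) = \int g(E)\,[F(E) - f]\,\mathrm{d}E$ over the region $F > f$, with:

- $F = \rho_0 v_0^{-3}\tau\chi^{-(6-\gamma)/2}$;
- $\chi = [(2-\gamma)E/v_0^2]^{1/(2-\gamma)}$;
- $g(E) = r_0^3 v_0 g_\gamma \chi^{(8-\gamma)/2}$.

The sketch below does this only for $0 < \gamma < 2$, in units $\rho_0 = r_0 = v_0 = 1$ (a convenience assumption), and also confirms the logarithmic slope $-2(3-\gamma)/(6-\gamma)$. The function names are hypothetical.

```python
import math
import numpy as np

def beta(a, b):
    """Euler beta function B(a, b), evaluated via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def tau_iso(gamma):
    """Spherical isotropic normalisation tau, branch 0 < gamma < 2."""
    return (2.0 - gamma) ** 1.5 / (2.0 ** 2.5 * math.pi * beta(1.5, gamma / (2.0 - gamma)))

def g_iso(gamma):
    """Spherical phase-space-volume constant g_gamma, branch gamma < 2."""
    return (4.0 * math.pi) ** 2 * math.sqrt(2.0) * beta(1.5, 3.0 / (2.0 - gamma)) / (2.0 - gamma) ** 1.5

def D_closed(f, gamma):
    """Closed form (A13) for g = h = 1, units rho_0 = r_0 = v_0 = 1."""
    k = 3.0 * (4.0 - gamma) / (6.0 - gamma)
    return ((6.0 - gamma) / (3.0 * (3.0 - gamma) * (4.0 - gamma))
            * tau_iso(gamma) ** k * f ** (-2.0 * (3.0 - gamma) / (6.0 - gamma)) * g_iso(gamma))

def D_direct(f, gamma, n=400_000):
    """Midpoint rule for int g(E) [F(E) - f] dE over the energies where F(E) > f."""
    tau, g = tau_iso(gamma), g_iso(gamma)
    chi_f = (tau / f) ** (2.0 / (6.0 - gamma))       # chi at which F = f
    E_f = chi_f ** (2.0 - gamma) / (2.0 - gamma)     # corresponding energy
    E = (np.arange(n) + 0.5) / n * E_f               # midpoints on (0, E_f), where F > f
    chi = ((2.0 - gamma) * E) ** (1.0 / (2.0 - gamma))
    F = tau * chi ** (-(6.0 - gamma) / 2.0)          # isotropic DF, h = 1
    gE = g * chi ** ((8.0 - gamma) / 2.0)            # phase-space volume per unit energy
    return np.sum(gE * (F - f)) * E_f / n

for gamma in (0.5, 1.0, 1.5):
    slope = math.log(D_closed(10.0, gamma) / D_closed(1.0, gamma)) / math.log(10.0)
    print(f"gamma={gamma}: D_direct/D_closed={D_direct(1.0, gamma) / D_closed(1.0, gamma):.6f}, "
          f"dlnD/dlnf={slope:.4f}")
```

The slopes become shallower with increasing $\gamma$, which is the property used in section 5.1: steeper cusps retain more excess mass at large $f$.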



